DuckDB - Retrieving MOT test data
Using the Open Data Anonymised MOT tests and results to try out DuckDB working with ‘large’ data locally.
The first step is to download all the compressed files from the Open Data website. A few minutes later there are over 25GB of files. Here are the resulting files…
-rw-rw-r-- 1 simon simon 451M Jan 14 15:36 dft_test_item_2017.zip
-rw-rw-r-- 1 simon simon 366M Jan 14 15:34 dft_test_item_2018.zip
-rw-rw-r-- 1 simon simon 394M Jan 14 15:34 dft_test_item_2019.zip
-rw-rw-r-- 1 simon simon 367M Jan 14 15:34 dft_test_item_2020.zip
-rw-rw-r-- 1 simon simon 602M Jan 14 15:38 dft_test_item_2021.zip
-rw-rw-r-- 1 simon simon 416M Jan 14 15:34 dft_test_item_2022.zip
-rw-rw-r-- 1 simon simon 439M Jan 14 15:22 dft_test_item_2023.zip
-rw-rw-r-- 1 simon simon 1.1G Jan 14 15:46 dft_test_result_2017.zip
-rw-rw-r-- 1 simon simon 1.1G Jan 14 15:46 dft_test_result_2018.zip
-rw-rw-r-- 1 simon simon 1.1G Jan 14 15:46 dft_test_result_2019.zip
-rw-rw-r-- 1 simon simon 1.1G Jan 14 15:46 dft_test_result_2020.zip
-rw-rw-r-- 1 simon simon 1.2G Jan 14 15:46 dft_test_result_2021.zip
-rw-rw-r-- 1 simon simon 1.1G Jan 14 15:46 dft_test_result_2022.zip
-rw-rw-r-- 1 simon simon 1.2G Jan 8 15:09 dft_test_result_2023.zip
-rw-rw-r-- 1 simon simon 249K Mar 8 11:00 lookup.zip
-rw-rw-r-- 1 simon simon 47M Jan 14 15:23 test_item_2005.txt.gz
-rw-rw-r-- 1 simon simon 218M Jan 14 15:30 test_item_2006.txt.gz
-rw-rw-r-- 1 simon simon 252M Jan 14 15:31 test_item_2007.txt.gz
-rw-rw-r-- 1 simon simon 283M Jan 14 15:32 test_item_2008.txt.gz
-rw-rw-r-- 1 simon simon 309M Jan 14 15:33 test_item_2009.txt.gz
-rw-rw-r-- 1 simon simon 313M Jan 14 15:33 test_item_2010.txt.gz
-rw-rw-r-- 1 simon simon 323M Jan 14 15:33 test_item_2011.txt.gz
-rw-rw-r-- 1 simon simon 335M Jan 14 15:33 test_item_2012.txt.gz
-rw-rw-r-- 1 simon simon 348M Jan 14 15:33 test_item_2013.txt.gz
-rw-rw-r-- 1 simon simon 347M Jan 14 15:33 test_item_2014.txt.gz
-rw-rw-r-- 1 simon simon 332M Jan 14 15:33 test_item_2015.txt.gz
-rw-rw-r-- 1 simon simon 333M Jan 14 15:33 test_item_2016.txt.gz
-rw-rw-r-- 1 simon simon 205M Jan 14 15:29 test_result_2005.txt.gz
-rw-rw-r-- 1 simon simon 874M Jan 14 15:44 test_result_2006.txt.gz
-rw-rw-r-- 1 simon simon 917M Jan 14 15:45 test_result_2007.txt.gz
-rw-rw-r-- 1 simon simon 937M Jan 14 15:45 test_result_2008.txt.gz
-rw-rw-r-- 1 simon simon 957M Jan 14 15:45 test_result_2009.txt.gz
-rw-rw-r-- 1 simon simon 972M Jan 14 15:46 test_result_2010.txt.gz
-rw-rw-r-- 1 simon simon 992M Jan 14 15:46 test_result_2011.txt.gz
-rw-rw-r-- 1 simon simon 993M Jan 14 15:46 test_result_2012.txt.gz
-rw-rw-r-- 1 simon simon 1009M Jan 14 15:46 test_result_2013.txt.gz
-rw-rw-r-- 1 simon simon 1016M Jan 14 15:46 test_result_2014.txt.gz
-rw-rw-r-- 1 simon simon 1.0G Jan 14 15:46 test_result_2015.txt.gz
-rw-rw-r-- 1 simon simon 1.1G Jan 14 15:46 test_result_2016.txt.gz
Next, extract each of the files and check the data. I’ll focus on just the test results initially and return to the failure items later.
2023
simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2023.zip
Archive: dft_test_result_2023.zip
inflating: test_result.csv
simon@NUC:~/Documents/mot_data$ mv test_result.csv test_result_2023.csv
simon@NUC:~/Documents/mot_data$ ls -l test_result_2023.csv
-rw-rw-r-- 1 simon simon 3661104239 Feb 5 2024 test_result_2023.csv
simon@NUC:~/Documents/mot_data$ wc -l test_result_2023.csv
42216722 test_result_2023.csv
simon@NUC:~/Documents/mot_data$ head test_result_2023.csv
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1994821045|838565361|2023-01-02|4|NT|P|179357|NW|TOYOTA|PRIUS +|WHITE|HY|1798|2016-06-17
358005195|484499974|2023-01-01|4|NT|P|300072|B|TOYOTA|PRIUS|RED|HY|1500|2008-09-13
773392437|53988366|2023-01-02|4|NT|PRS|307888|HA|TOYOTA|PRIUS|GREY|HY|1497|2010-01-15
133665147|606755010|2023-01-02|4|NT|F|65810|SE|TOYOTA|PRIUS|SILVER|HY|1497|2007-03-28
656743571|606755010|2023-01-02|4|RT|P|65810|SE|TOYOTA|PRIUS|SILVER|HY|1497|2007-03-28
607277335|1307416223|2023-01-02|4|NT|P|211242|NW|TOYOTA|PRIUS|PURPLE|HY|1790|2016-02-01
1779040733|984166795|2023-01-02|4|NT|P|150344|UB|TOYOTA|PRIUS|WHITE|HY|1797|2019-12-20
234737553|21541545|2023-01-02|4|NT|P|28649|HA|TOYOTA|PRIUS|BLUE|HY|1797|2020-07-01
1125876917|1074624265|2023-01-02|4|NT|P|98679|E|TOYOTA|PRIUS|RED|HY|1798|2016-11-18
That’s 3.66GB (about 3.4GiB) of pipe-delimited CSV with 42 million lines. A quick look at the first few rows suggests the data is good. Now to repeat for 2022.
2022
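Looking at the first ten rows only says so much about 42 million lines. A cheap extra check is to count the delimiter-separated fields on every line and see whether they all agree. The sketch below is my own helper (`field_counts` is a made-up name, not part of any tool), and it assumes the delimiter never appears inside a field, which is not verified here.

```shell
# Print each distinct field count and how many lines have it.
# $1 = file, $2 = single-character delimiter (defaults to |).
# Assumption: the delimiter never occurs inside a field value.
field_counts() {
    awk -F"${2:-|}" '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# Usage (file name taken from the listing above):
# field_counts test_result_2023.csv '|'
```

The header has 14 columns, so if every line is well formed the output should be a single line beginning with 14. Any other line in the output points at rows worth inspecting before import.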
simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2022.zip
Archive: dft_test_result_2022.zip
inflating: test_result_2022.csv
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2022.csv
-rw-rw-r-- 1 simon simon 3.4G Dec 8 2023 test_result_2022.csv
simon@NUC:~/Documents/mot_data$ wc -l test_result_2022.csv
41632879 test_result_2022.csv
simon@NUC:~/Documents/mot_data$ head test_result_2022.csv
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
334683447|634775234|2022-01-01|4|NT|P|227219|E|TOYOTA|PRIUS|SILVER|HY|1497|2008-01-17
586095521|1220215709|2022-01-01|4|NT|P|136552|CR|TOYOTA|PRIUS|GREY|HY|1798|2013-11-29
960974211|1315791989|2022-01-01|4|NT|F|129847|E|TOYOTA|PRIUS|WHITE|HY|1798|2018-01-01
1041792341|1144451355|2022-01-01|4|NT|P|123133|TW|TOYOTA|PRIUS|SILVER|HY|1496|2016-11-21
1587264975|1315791989|2022-01-01|4|RT|P|129848|E|TOYOTA|PRIUS|WHITE|HY|1798|2018-01-01
1032834657|1310098304|2022-01-01|4|NT|PRS|238117|IG|TOYOTA|PRIUS|BLACK|HY|1798|2012-06-29
51919479|483214935|2022-01-01|4|NT|P|110322|E|TOYOTA|PRIUS|SILVER|HY|1800|2017-04-01
1616476935|1262912232|2022-01-01|4|NT|P|161933|BD|TOYOTA|PRIUS|SILVER|HY|1497|2005-12-30
640040599|302221893|2022-01-01|4|NT|P|47101|IG|TOYOTA|PRIUS|SILVER|HY|1797|2016-06-28
File naming doesn’t seem to be consistent between years: this time the file name already has the year appended. This file is 3.4G, with 41 million lines.
2021
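Since some archives unpack to `test_result.csv` and others to `test_result_<year>.csv`, the manual `mv` can be wrapped so it is safe to run for either case. This is only a sketch: `add_year_suffix` is a hypothetical helper of mine, and it only knows about the two naming patterns seen so far.

```shell
# Rename test_result.csv to test_result_<year>.csv unless the target exists.
# $1 = year, $2 = directory (defaults to the current one).
add_year_suffix() {
    year=$1
    dir=${2:-.}
    src="$dir/test_result.csv"
    dst="$dir/test_result_${year}.csv"
    if [ -e "$dst" ]; then
        # The archive already used the year-suffixed name.
        echo "already named: $dst"
    elif [ -e "$src" ]; then
        mv "$src" "$dst" && echo "renamed: $src -> $dst"
    else
        echo "nothing to rename in $dir" >&2
        return 1
    fi
}

# Usage:
# add_year_suffix 2023
# add_year_suffix 2022
```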
simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2021.zip
Archive: dft_test_result_2021.zip
inflating: test_result_2022/test_result_20220531131730_32355.csv
inflating: test_result_2022/test_result_20220531131730_32357.csv
inflating: test_result_2022/test_result_20220531131730_32360.csv
inflating: test_result_2022/test_result_20220531131730_32361.csv
inflating: test_result_2022/test_result_20220531131730_32365.csv
inflating: test_result_2022/test_result_20220531131730_32367.csv
inflating: test_result_2022/test_result_20220531131730_32370.csv
inflating: test_result_2022/test_result_20220531131730_32372.csv
inflating: test_result_2022/test_result_20220531131730_32375.csv
inflating: test_result_2022/test_result_20220531131730_32378.csv
inflating: test_result_2022/test_result_20220531131730_32384.csv
inflating: test_result_2022/test_result_20220531131730_32386.csv
simon@NUC:~/Documents/mot_data$ cd test_result_2022/
simon@NUC:~/Documents/mot_data/test_result_2022$ ls -lh
total 4.2G
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32355.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32357.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32360.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32361.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32365.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32367.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32370.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32372.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32375.csv
-rw-r--r-- 1 simon simon 355M May 31 2022 test_result_20220531131730_32378.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32384.csv
-rw-r--r-- 1 simon simon 354M May 31 2022 test_result_20220531131730_32386.csv
OK, so dft_test_result_2021.zip creates a folder and multiple files labelled 2022. Let’s check their contents…
simon@NUC:~/Documents/mot_data/test_result_2022$ head *
==> test_result_20220531131730_32355.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1488085241,298646303,"2021-01-01","4","NT","P","113094","PO","VOLKSWAGEN","CADDY","WHITE","DI","1598","2013-01-01"
1360139783,1372832822,"2021-01-01","4","NT","P","146500","LU","VAUXHALL","ASTRA","BLUE","DI","1686","2006-09-29"
1232194325,152373223,"2021-01-01","4","NT","F","96459","DE","VAUXHALL","MOKKA","WHITE","DI","1686","2013-04-27"
464521577,17056716,"2021-01-01","4","NT","P","201104","B","HONDA","JAZZ","BLACK","PE","1339","2005-10-31"
848357951,888720926,"2021-01-01","4","NT","P","160067","IP","PEUGEOT","407","RED","DI","1997","2007-06-29"
80685203,471452873,"2021-01-01","4","NT","P","18017","W","MERCEDES-BENZ","B-CLASS","BLACK","DI","1796","2012-12-28"
592467035,85469405,"2021-01-01","4","NT","P","129977","E","BMW","3 SERIES","BLACK","OT","2979","2013-09-26"
720412493,216763752,"2021-01-01","4","NT","F","75954","WF","FIAT","QUBO","SILVER","DI","1248","2011-02-09"
1824794287,938472162,"2021-01-01","4","NT","F","106640","NG","VAUXHALL","VIVARO","SILVER","DI","1998","2012-06-21"
==> test_result_20220531131730_32357.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1279321653,634412746,"2021-01-01","4","NT","F","343513","WF","MERCEDES-BENZ","SPRINTER","YELLOW","DI","3000","2012-04-25"
1151376195,1234673114,"2021-01-01","4","NT","P","157010","TW","AUDI","A3","SILVER","PE","1781","2001-11-23"
895485279,911643456,"2021-01-01","4","NT","P","132796","TR","FORD","FOCUS","RED","PE","1596","2009-06-11"
639594363,687919491,"2021-01-01","4","NT","P","53782","NR","LEXUS","RX","WHITE","HY","3456","2015-03-06"
767539821,889650434,"2021-01-01","4","NT","PRS","103743","TS","ISUZU","TROOPER CITATION LWB","SILVER","DI","2999","2002-09-10"
1023430737,1453366760,"2021-01-01","4","NT","P","96932","NE","FIAT","DOBLO","RED","DI","1248","2013-02-18"
383703447,51917029,"2021-01-01","7","NT","P","99190","B","MERCEDES-BENZ","SPRINTER","WHITE","DI","2143","2017-10-19"
511648905,497689026,"2021-01-01","4","NT","P","115537","BS","FORD","MONDEO","GREY","DI","1997","2010-06-11"
1871921615,177413768,"2021-01-01","4","NT","P","100202","N","NISSAN","NOTE","SILVER","PE","1386","2011-06-30"
==> test_result_20220531131730_32360.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1986430547,564890282,"2021-01-01","4","NT","PRS","117640","B","VOLKSWAGEN","TOURAN","BLACK","PE","1598","2006-03-23"
1474648715,722134102,"2021-01-01","4","NT","P","83331","ST","PEUGEOT","307","SILVER","PE","1360","2006-07-24"
195194135,1199350646,"2021-01-01","4","NT","P","50394","TQ","FORD","KA","SILVER","PE","1242","2012-04-27"
1171630471,184914876,"2021-01-01","4","RT","P","203055","HD","VOLKSWAGEN","GOLF","BLACK","DI","1968","2009-06-03"
1811357761,224020108,"2021-01-01","4","NT","P","116190","M","BMW","520","BLACK","DI","1995","2007-06-15"
787794097,115643448,"2021-01-01","4","NT","F","34237","BS","FORD","KUGA","SILVER","DI","1997","2012-07-31"
1380394059,1288867532,"2021-01-01","4","NT","P","145298","CB","FORD","FOCUS","BLACK","PE","1596","2008-10-31"
1124503143,838418611,"2021-01-01","4","NT","P","55606","IG","BMW","420","WHITE","DI","1995","2015-04-28"
484775853,241631414,"2021-01-01","4","NT","F","121005","WV","VOLKSWAGEN","GOLF","BLACK","PE","1390","2005-03-06"
==> test_result_20220531131730_32361.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1831612037,761864998,"2021-01-01","4","NT","P","88619","B","VOLKSWAGEN","GOLF","SILVER","DI","1896","2007-03-02"
599284785,116850262,"2021-01-01","4","NT","F","113055","W","PEUGEOT","207","BLACK","PE","1397","2009-12-02"
1063939289,1032666631,"2021-01-01","4","NT","P","50137","W","RENAULT","CLIO","GREY","PE","899","2015-03-24"
552157457,821557154,"2021-01-01","4","NT","P","140074","B","TOYOTA","COROLLA","RED","PE","1794","2007-09-11"
424211999,8574225,"2021-01-01","4","NT","P","103539","GL","TOYOTA","HILUX","SILVER","DI","2982","2015-05-14"
1993248297,857979320,"2021-01-01","4","NT","PRS","248808","NG","FORD","GALAXY","SILVER","DI","1753","2007-03-30"
74066427,732750802,"2021-01-01","4","RT","P","137384","B","AUDI","A6","GREY","DI","2967","2006-05-26"
377084671,373505197,"2021-01-01","4","NT","PRS","130120","PE","HONDA","CIVIC","BROWN","DI","1597","2013-11-29"
1690230053,517503710,"2021-01-01","4","NT","F","131363","ST","VAUXHALL","ASTRA","BLACK","DI","1910","2007-12-08"
==> test_result_20220531131730_32365.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1427521387,1275325852,"2021-01-01","4","NT","PRS","154859","B","TOYOTA","YARIS","BEIGE","PE","998","1999-10-06"
996557685,1104183300,"2021-01-01","4","RT","P","189669","W","VOLVO","900 Series","BLUE","PE","2316","1997-09-17"
868612227,128460869,"2021-01-01","4","RT","P","68897","CF","FORD","FIESTA","WHITE","PE","1242","2013-09-01"
276012265,1307831708,"2021-01-01","4","NT","P","78136","S","CITROEN","XSARA","SILVER","DI","1997","2001-11-01"
1508339517,722698972,"2021-01-01","4","NT","P","139570","B","FORD","GALAXY","BLACK","DI","1999","2010-03-24"
403957723,236826272,"2021-01-01","4","NT","P","2385","BS","AUSTIN","MINI MAYFAIR","GREEN","PE","1998","1990-08-24"
1252448601,69590214,"2021-01-01","4","NT","P","161781","NG","VOLVO","S40","WHITE","DI","1997","2009-09-29"
1636284975,229487022,"2021-01-01","4","NT","PRS","112789","B","FORD","S-MAX","BLUE","DI","1997","2009-09-01"
612721311,687872748,"2021-01-01","4","NT","P","107788","HA","NISSAN","NOTE","SILVER","PE","1598","2006-11-02"
==> test_result_20220531131730_32367.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1972994021,372918378,"2021-01-01","4","NT","P","192005","DE","NISSAN","NAVARA","BLUE","DI","2488","2005-09-29"
1797921235,1410950684,"2021-01-01","4","RT","P","117283","YO","NISSAN","NAVARA","BLACK","DI","2488","2009-03-20"
949430357,494602622,"2021-01-01","4","NT","F","105987","TA","FORD","MONDEO","GREY","DI","1997","2009-06-01"
1542030319,158896024,"2021-01-01","4","NT","P","88713","GU","PEUGEOT","307","BLUE","PE","1360","2007-03-28"
1158193945,405359676,"2021-01-01","4","NT","P","109011","S","KIA","SPORTAGE","SILVER","DI","1995","2012-03-31"
821484899,318101814,"2021-01-01","4","NT","PRS","87002","B","VOLKSWAGEN","POLO","SILVER","PE","1390","2002-05-20"
1959557495,225828328,"2021-01-01","4","RT","P","136529","B","NISSAN","MICRA","RED","DI","1461","2006-06-15"
262575739,587455594,"2021-01-01","4","NT","P","104812","HA","MERCEDES-BENZ","E","GREY","DI","2143","2010-09-06"
390521197,917399585,"2021-01-01","4","NT","P","56359","E","MITSUBISHI","OUTLANDER","BLACK","HY","1998","2015-06-23"
==> test_result_20220531131730_32370.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1447775663,762736456,"2021-01-01","4","RT","P","188466","TR","AUDI","A6","SILVER","DI","2698","2006-04-19"
855175701,108434218,"2021-01-01","4","NT","PRS","69061","B","RENAULT","CLIO","SILVER","PE","1390","2007-12-21"
1912430167,958268864,"2021-01-01","4","NT","PRS","119307","B","AUDI","A3","WHITE","DI","1968","2009-07-16"
1656539251,93119174,"2021-01-01","4","NT","P","91283","B","TOYOTA","COROLLA","BLACK","PE","1398","2006-10-31"
457902801,843004236,"2021-01-01","4","RT","P","235305","RG","FORD","TRANSIT","WHITE","DI","2198","2012-02-17"
713793717,1379084492,"2021-01-01","4","NT","P","134851","B","BMW","X5","GREY","DI","2993","2011-07-05"
1865302839,418842510,"2021-01-01","4","NT","P","160027","IG","VOLKSWAGEN","PASSAT","GREY","DI","1968","2009-07-15"
1818175511,168033852,"2021-01-01","7","NT","P","426591","B","MERCEDES-BENZ","SPRINTER","WHITE","DI","2148","2005-11-04"
1643102725,973895810,"2021-01-01","4","NT","F","29503","BS","RENAULT","CLIO","BLUE","PE","1149","1999-11-29"
==> test_result_20220531131730_32372.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1205321273,1204453892,"2021-01-01","4","NT","F","81916","OX","VAUXHALL","CORSA","BLACK","PE","1229","2011-01-27"
1286139403,314898326,"2021-01-01","4","NT","P","52932","NG","FORD","FOCUS","RED","PE","1596","2011-03-19"
774357571,1338803624,"2021-01-01","4","NT","P","116405","CT","VAUXHALL","CORSA","SILVER","PE","998","2010-11-26"
1494902991,14313660,"2021-01-01","4","NT","P","152177","OL","VOLKSWAGEN","GOLF","BLACK","DI","1896","2003-04-25"
181757609,479742089,"2021-01-01","4","NT","P","17385","DN","MERCEDES-BENZ","C","SILVER","PE","1991","2016-07-21"
1575721121,1395768361,"2021-01-01","4","NT","P","55299","B","MERCEDES-BENZ","CITAN","WHITE","DI","1461","2016-11-28"
935993831,443613806,"2021-01-01","4","NT","P","59680","LN","MAZDA","2","SILVER","PE","1349","2008-03-29"
1319830205,845986202,"2021-01-01","4","NT","P","177288","SA","RENAULT","TRAFIC","BLACK","DI","1870","2003-03-26"
1400648335,1454890471,"2021-01-01","4","NT","P","61757","OX","VOLKSWAGEN","TRANSPORTER","GREY","DI","1968","2016-03-02"
==> test_result_20220531131730_32375.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1353521007,1428275549,"2021-01-01","4","NT","P","25550","B","NISSAN","NOTE","SILVER","PE","1198","2016-11-14"
1272702877,808958468,"2021-01-01","4","NT","F","132516","BB","CITROEN","C4","BLACK","DI","1997","2008-04-22"
410775473,1336517888,"2021-01-01","4","NT","P","84448","TW","BMW","116","BLUE","PE","1596","2010-03-17"
1979811771,1290599648,"2021-01-01","4","RT","P","238050","BS","PORSCHE","CAYENNE","BLACK","PE","3189","2005-06-16"
1468029939,608708764,"2021-01-01","4","RT","P","135797","IG","FIAT","PUNTO","BLACK","PE","1242","2006-06-27"
316520817,1277248145,"2021-01-02","4","NT","P","57968","TN","SKODA","FABIA","RED","PE","1197","2015-07-16"
60629901,764751148,"2021-01-02","4","NT","P","79761","PE","PEUGEOT","BIPPER","RED","DI","1397","2010-02-08"
1292957153,964933515,"2021-01-02","4","NT","P","29808","LU","VOLVO","V40","BLACK","DI","1969","2016-01-29"
1629666199,141895556,"2021-01-02","4","NT","P","157456","SS","TOYOTA","RAV4","BLUE","PE","1998","1997-01-31"
==> test_result_20220531131730_32378.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
808048373,1167625608,"2021-01-01","4","NT","PRS","130084","CV","FORD","FOCUS","GREY","DI","1560","2006-01-18"
1737357381,248086777,"2021-01-01","4","NT","P","118472","B","VOLKSWAGEN","GOLF","SILVER","DI","1598","2012-10-30"
505030129,1270356948,"2021-01-01","4","NT","P","100569","WF","MERCEDES-BENZ","C","WHITE","DI","2987","2010-03-31"
585848259,112244534,"2021-01-01","4","NT","P","51209","S","SEAT","IBIZA","BLACK","PE","1390","2007-06-21"
26939099,880738429,"2021-01-01","4","NT","P","60897","BD","TOYOTA","ESTIMA","WHITE","HY","2362","2019-07-01"
875429977,1053590438,"2021-01-01","4","NT","PRS","225196","B","HONDA","CIVIC","BLUE","DI","2204","2007-01-13"
1259266351,775389341,"2021-01-01","4","NT","P","106839","PR","VAUXHALL","ASTRA","BLACK","DI","1686","2013-12-31"
1515157267,38626000,"2021-01-01","4","NT","P","135200","BB","SEAT","IBIZA","BLACK","DI","1598","2010-05-18"
363648145,599995984,"2021-01-01","4","NT","P","130800","B","SUBARU","IMPREZA","BLUE","PE","1994","2001-09-01"
==> test_result_20220531131730_32384.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1373775283,1152701668,"2021-01-02","4","NT","PRS","111961","IG","MERCEDES-BENZ","S-Class","BLACK","PE","3199","2001-07-27"
94320703,590801779,"2021-01-02","4","NT","P","22838","SE","BMW","3 SERIES","BLUE","DI","1995","2015-12-30"
1724119829,448185815,"2021-01-02","4","RT","P","157346","HU","MERCEDES-BENZ","VITO","BLUE","DI","1598","2015-12-31"
1952938719,459450004,"2021-01-02","4","NT","P","125054","ME","PEUGEOT","407","SILVER","DI","1997","2008-09-30"
1232592273,361694814,"2021-01-02","4","RT","P","130921","SS","VOLKSWAGEN","GOLF","SILVER","PE","1197","2010-09-16"
1057320513,1097794689,"2021-01-02","4","NT","P","139127","TW","FORD","TRANSIT","WHITE","DI","2198","2014-09-30"
1091210289,352834300,"2021-01-02","4","RT","P","42119","DN","FORD","FOCUS","GREEN","PE","2522","2010-09-04"
357029369,271730714,"2021-01-02","4","NT","F","88105","RH","SKODA","FABIA","GREY","DI","1598","2012-03-29"
498411353,45784180,"2021-01-02","4","NT","F","78730","DN","LAND ROVER","RANGE ROVER","GREY","DI","3630","2006-11-30"
==> test_result_20220531131730_32386.csv <==
"test_id","vehicle_id","test_date","test_class_id","test_type","test_result","test_mileage","postcode_area","make","model","colour","fuel_type","cylinder_capacity","first_use_date"
1676793527,856620044,"2021-01-02","4","NT","P","90293","CF","AUDI","A4","BLACK","DI","1968","2008-04-22"
1037066237,325745083,"2021-01-02","4","NT","P","118982","SL","VOLKSWAGEN","GOLF","BLACK","DI","1968","2014-07-01"
525284405,877659358,"2021-01-02","4","NT","P","94618","SG","MAZDA","5","GREY","PE","1998","2009-09-15"
175138833,921349839,"2021-01-02","4","NT","P","19887","BN","PEUGEOT","208","GREY","PE","1200","2015-06-30"
1872120589,348180150,"2021-01-02","4","NT","P","135914","PE","AUDI","A3","BLACK","PE","1390","2008-07-18"
431029749,1367522568,"2021-01-02","4","NT","P","120012","UB","BMW","3 SERIES","BLUE","PE","1995","2008-12-17"
612920285,124080203,"2021-01-02","4","NT","P","79667","UB","LAND ROVER","RANGE ROVER","GOLD","DI","2993","2014-12-19"
787993071,15120336,"2021-01-02","4","NT","P","123628","SM","AUDI","A6","SILVER","DI","1968","2011-06-15"
451284025,72068234,"2021-01-02","4","NT","P","153516","SW","FORD","FIESTA","GREEN","PE","1242","2007-05-18"
The contents look correct for 2021, but note that this year uses commas as the delimiter and every column except test_id and vehicle_id is quoted. This will need to be taken into account when importing. The files total 4.2GB, considerably more than 2022 and 2023, but the quoted strings would account for this.
Checking number of rows (this output was captured after the rename and directory move shown further down, hence the 2021 names)…
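With the layout changing from year to year, it helps to have a quick way to ask a file what it looks like before deciding on import options. The helper below is a rough sketch of my own (`describe_format` is not an existing command); it judges purely from the header line, so it would be fooled by a header that happens to contain both characters.

```shell
# Report the delimiter and quoting style of a file, judged from its header only.
# $1 = file. Prints e.g. "pipe unquoted" or "comma quoted".
describe_format() {
    header=$(head -n 1 "$1")
    case $header in
        *'|'*) delim='pipe' ;;
        *','*) delim='comma' ;;
        *)     delim='unknown' ;;
    esac
    case $header in
        *'"'*) quoting='quoted' ;;
        *)     quoting='unquoted' ;;
    esac
    echo "$delim $quoting"
}

# Usage:
# for f in test_result_2023.csv test_result_2022/*.csv; do
#     printf '%s: %s\n' "$f" "$(describe_format "$f")"
# done
```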
simon@NUC:~/Documents/mot_data/test_result_2021$ wc -l *
3362529 test_result_2021_32355.csv
3362450 test_result_2021_32357.csv
3368008 test_result_2021_32360.csv
3363319 test_result_2021_32361.csv
3366233 test_result_2021_32365.csv
3367302 test_result_2021_32367.csv
3363390 test_result_2021_32370.csv
3362481 test_result_2021_32372.csv
3363314 test_result_2021_32375.csv
3369258 test_result_2021_32378.csv
3367463 test_result_2021_32384.csv
3364911 test_result_2021_32386.csv
40380658 total
This is a similar number to 2022/2023, which is consistent with the file size increase being due only to the quoting of strings.
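The quoting explanation can be sanity-checked with a little arithmetic on average bytes per line. Summing the byte sizes from the `ls -l` listing further down gives 4,450,425,692 bytes over 40,380,658 lines for 2021, roughly 110.2 bytes per line, while 2023 has 3,661,104,239 bytes over 42,216,722 lines, roughly 86.7. The gap is about 23.5 bytes per line, and 12 quoted columns at 2 quote characters each would add 24. The years hold different data so this is only indicative, but it fits. The helper below (`bytes_per_line`, my own name) is a sketch for computing the same figure straight from the files.

```shell
# Average bytes per line across one or more files, to one decimal place.
# Reads the data once via wc, which reports the line count then the byte count.
bytes_per_line() {
    cat "$@" | wc -lc | awk '$1 > 0 { printf "%.1f\n", $2 / $1 }'
}

# Usage:
# bytes_per_line test_result_2023.csv
# bytes_per_line test_result_2021/*.csv
```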
A quick rename will keep the files and directory consistent with the other years…
simon@NUC:~/Documents/mot_data/test_result_2022$ rename 's/20220531131730/2021/' *
simon@NUC:~/Documents/mot_data/test_result_2022$ ls -l
total 4346148
-rw-r--r-- 1 simon simon 370591916 May 31 2022 test_result_2021_32355.csv
-rw-r--r-- 1 simon simon 370584177 May 31 2022 test_result_2021_32357.csv
-rw-r--r-- 1 simon simon 371195101 May 31 2022 test_result_2021_32360.csv
-rw-r--r-- 1 simon simon 370681200 May 31 2022 test_result_2021_32361.csv
-rw-r--r-- 1 simon simon 371006342 May 31 2022 test_result_2021_32365.csv
-rw-r--r-- 1 simon simon 371116837 May 31 2022 test_result_2021_32367.csv
-rw-r--r-- 1 simon simon 370669369 May 31 2022 test_result_2021_32370.csv
-rw-r--r-- 1 simon simon 370581748 May 31 2022 test_result_2021_32372.csv
-rw-r--r-- 1 simon simon 370670092 May 31 2022 test_result_2021_32375.csv
-rw-r--r-- 1 simon simon 371339397 May 31 2022 test_result_2021_32378.csv
-rw-r--r-- 1 simon simon 371123609 May 31 2022 test_result_2021_32384.csv
-rw-r--r-- 1 simon simon 370865904 May 31 2022 test_result_2021_32386.csv
simon@NUC:~/Documents/mot_data/test_result_2022$ cd ..
simon@NUC:~/Documents/mot_data$ mv test_result_2022 test_result_2021
2020
simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2020.zip
Archive: dft_test_result_2020.zip
inflating: dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv
inflating: dft_test_result-from-2020-04-01_00-00-00-to-2020-07-01_00-00-00.csv
inflating: dft_test_result-from-2020-07-01_00-00-00-to-2020-10-01_00-00-00.csv
inflating: dft_test_result-from-2020-10-01_00-00-00-to-2021-01-01_00-00-00.csv
simon@NUC:~/Documents/mot_data$ ls -lh dft_test_result-from-2020-*
-rw-r--r-- 1 simon simon 830M Mar 18 2021 dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv
-rw-r--r-- 1 simon simon 439M Mar 18 2021 dft_test_result-from-2020-04-01_00-00-00-to-2020-07-01_00-00-00.csv
-rw-r--r-- 1 simon simon 914M Mar 18 2021 dft_test_result-from-2020-07-01_00-00-00-to-2020-10-01_00-00-00.csv
-rw-r--r-- 1 simon simon 986M Mar 18 2021 dft_test_result-from-2020-10-01_00-00-00-to-2021-01-01_00-00-00.csv
OK, different file naming again: four quarterly files this time. The total of 3.1GB is in line with 2022/2023.
simon@NUC:~/Documents/mot_data$ wc -l dft_test_result-from-2020-*
10104426 dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv
5362646 dft_test_result-from-2020-04-01_00-00-00-to-2020-07-01_00-00-00.csv
11137837 dft_test_result-from-2020-07-01_00-00-00-to-2020-10-01_00-00-00.csv
11989108 dft_test_result-from-2020-10-01_00-00-00-to-2021-01-01_00-00-00.csv
38594017 total
simon@NUC:~/Documents/mot_data$ head dft_test_result-from-2020-*
==> dft_test_result-from-2020-01-01_00-00-00-to-2020-04-01_00-00-00.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
666422869,1253657552,2020-01-01,4,NT,P,63975,TR,CITROEN,DISPATCH,WHITE,DI,1560,2011-03-14
623774383,51021182,2020-01-01,4,NT,P,107361,NN,SEAT,IBIZA,YELLOW,PE,1390,2008-12-18
581125897,612989654,2020-01-01,4,NT,P,73160,NN,MERCEDES,A 150,SILVER,PE,1498,2007-09-28
538477411,458058688,2020-01-01,4,NT,P,,TR,CITROEN,DISPATCH,WHITE,DI,1868,2004-11-19
325234981,1422080365,2020-01-01,1,NT,F,27120,SS,KTM,125,ORANGE,PE,125,2013-12-07
367883467,1254023710,2020-01-01,4,NT,P,81260,RM,FORD,FOCUS,BLUE,PE,1596,2005-06-13
453180439,1266564042,2020-01-01,4,NT,P,93426,CA,SKODA,FABIA,GREY,DI,1598,2012-06-21
282586495,341436608,2020-01-01,4,NT,P,127237,B,VAUXHALL,INSIGNIA,WHITE,DI,1956,2010-09-10
154641037,504524381,2020-01-01,4,NT,P,109759,B,SKODA,RAPID,RED,DI,1598,2014-12-19
==> dft_test_result-from-2020-04-01_00-00-00-to-2020-07-01_00-00-00.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
677835507,1044704117,2020-04-01,4,RT,P,50331,M,PEUGEOT,EXPERT,RED,DI,1560,2015-10-31
763132479,1217941099,2020-04-01,7,NT,P,156078,WA,MERCEDES-BENZ,SPRINTER,WHITE,DI,2143,2014-03-31
635187021,503571165,2020-04-01,7,NT,P,104440,BD,MERCEDES-BENZ,SPRINTER,WHITE,DI,2143,2016-04-20
592538535,1399375571,2020-04-01,4,NT,P,34837,IP,VOLKSWAGEN,POLO,WHITE,PE,999,2016-04-30
549890049,1168611002,2020-04-01,4,NT,P,76149,CV,RENAULT,MEGANE,WHITE,DI,1461,2011-09-14
464593077,221689772,2020-04-01,4,NT,P,68683,S,FORD,FIESTA,WHITE,PE,1242,2012-03-20
421944591,384337924,2020-04-01,4,NT,P,120364,CV,NISSAN,NAVARA,SILVER,DI,2488,2005-11-30
507241563,1070427905,2020-04-01,4,NT,P,123609,WF,MERCEDES-BENZ,SPRINTER,YELLOW,DI,2987,2015-06-09
379296105,1218623039,2020-04-01,7,RT,P,58995,HU,MERCEDES-BENZ,SPRINTER,SILVER,DI,2143,2015-04-20
==> dft_test_result-from-2020-07-01_00-00-00-to-2020-10-01_00-00-00.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
534152707,749768485,2020-07-01,4,NT,P,44038,DN,RENAULT,KADJAR,BLACK,DI,1598,2015-08-28
619449679,1316673618,2020-07-01,4,NT,P,46052,OL,MINI,MINI (R58),RED,PE,1598,2012-08-10
662098165,1087081109,2020-07-01,4,NT,P,25513,NE,MERCEDES-BENZ,GLA,GREY,DI,2143,2017-02-16
235613305,1150300382,2020-07-01,4,RT,P,56624,B,VAUXHALL,COMBO,RED,DI,1248,2011-06-15
576801193,675795911,2020-07-01,4,NT,P,12352,LU,SKODA,CITIGO,WHITE,PE,999,2016-10-31
320910277,1383430983,2020-07-01,4,NT,F,67938,DN,AUDI,A4,BLACK,DI,1968,2016-11-16
448855735,187152929,2020-07-01,4,NT,P,86544,LU,MITSUBISHI,OUTLANDER,BLACK,HY,1998,2015-11-02
491504221,378989977,2020-07-01,7,NT,F,48171,HU,MERCEDES-BENZ,VITO,WHITE,DI,2143,2013-09-26
363558763,1451451149,2020-07-01,4,NT,P,20213,LE,FORD,MUSTANG,BLUE,PE,4951,2017-08-26
==> dft_test_result-from-2020-10-01_00-00-00-to-2021-01-01_00-00-00.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
840415303,1367588451,2020-10-01,4,NT,P,36788,LE,FORD,TRANSIT,SILVER,DI,2198,2014-10-03
797766817,146223609,2020-10-01,4,NT,P,21856,LU,NISSAN,JUKE,BLUE,PE,1618,2017-02-03
968360761,144911119,2020-10-01,4,NT,P,45041,LE,MINI,PACEMAN,BLACK,PE,1598,2015-03-26
755118331,740243933,2020-10-01,4,NT,P,30389,LU,FORD,FOCUS,BLACK,PE,1999,2017-11-22
1224251677,1368373019,2020-10-01,4,NT,P,38749,LU,LAND ROVER,RANGE ROVER EVOQUE,BLACK,DI,1999,2016-09-27
243336499,359109671,2020-10-01,4,RT,P,45397,LU,PEUGEOT,2008,WHITE,DI,1398,2014-06-30
158039527,163738397,2020-10-01,4,RT,P,93462,LU,MERCEDES-BENZ,C,GREY,DI,2143,2015-05-05
456578929,703774257,2020-10-01,4,NT,P,33700,HD,AUDI,A6,BLACK,DI,1968,2017-06-30
584524387,1498372684,2020-10-01,4,NT,PRS,173053,E,VAUXHALL,CORSAVAN,BLUE,DI,1686,2001-12-14
A total of 38.6 million tests, and the head of each file looks good, but this is yet another format: comma-delimited like 2021, but without the quotes.
2019
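That makes three layouts so far (pipe-delimited, quoted comma-delimited, and plain comma-delimited), so it is worth confirming that the column names and order at least stay the same. The sketch below strips quotes and turns pipes into commas before comparing header lines. Both function names are my own, and the comparison assumes no column name contains a pipe, comma or quote.

```shell
# Print a file's header with quotes and carriage returns removed
# and pipes converted to commas.
normalised_header() {
    head -n 1 "$1" | tr -d '"\r' | tr '|' ','
}

# Succeed only if every given file has the same normalised header as the first.
same_headers() {
    ref=$(normalised_header "$1")
    for f in "$@"; do
        if [ "$(normalised_header "$f")" != "$ref" ]; then
            echo "header differs: $f" >&2
            return 1
        fi
    done
    echo "all headers match: $ref"
}

# Usage:
# same_headers test_result_2023.csv test_result_2022.csv \
#     test_result_2021/*.csv dft_test_result-from-2020-*.csv
```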
simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2019.zip
Archive: dft_test_result_2019.zip
inflating: dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv
creating: __MACOSX/
inflating: __MACOSX/._dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv
inflating: dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv
inflating: __MACOSX/._dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv
inflating: dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv
inflating: __MACOSX/._dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv
inflating: dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv
inflating: __MACOSX/._dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv
Obviously compiled on a Mac this year. Let’s remove the metadata folder…
simon@NUC:~/Documents/mot_data$ rm -rf __MACOSX/
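An alternative to cleaning up afterwards is to tell `unzip` to skip those entries in the first place using its `-x` exclusion list. A minimal sketch follows. The wrapper name `unzip_without_macosx` is mine, and I have only assumed, not checked against this particular archive, that every metadata entry sits under `__MACOSX/`.

```shell
# Extract an archive while skipping the macOS metadata entries under __MACOSX/.
# $1 = zip file, $2 = target directory (defaults to the current one).
unzip_without_macosx() {
    unzip -q "$1" -x '__MACOSX/*' -d "${2:-.}"
}

# Usage:
# unzip_without_macosx dft_test_result_2019.zip
```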
And initial data checks…
simon@NUC:~/Documents/mot_data$ ls -lh dft_test_result-from-2019-*
-rw-r--r-- 1 simon simon 842M May 11 2020 dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv
-rw-r--r-- 1 simon simon 863M May 11 2020 dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv
-rw-r--r-- 1 simon simon 840M May 11 2020 dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv
-rw-r--r-- 1 simon simon 696M May 11 2020 dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv
simon@NUC:~/Documents/mot_data$ wc -l dft_test_result-from-2019-*
10210087 dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv
10466892 dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv
10194305 dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv
8439418 dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv
39310702 total
simon@NUC:~/Documents/mot_data$ head dft_test_result-from-2019-*
==> dft_test_result-from-2019-01-01_00-00-01-to-2019-04-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
1930167913,1168220651,2019-01-01,4,NT,P,47108,LL,LAND ROVER,DISCOVERY,WHITE,DI,2993,2014-07-29
1887519427,608494756,2019-01-01,4,NT,P,74254,RM,VAUXHALL,COMBO,BLUE,DI,1686,2000-10-16
1844870941,345838224,2019-01-01,4,NT,P,52596,RM,SMART (MCC),FORTWO COUPE,BLACK,PE,999,2010-06-30
1802222455,712515370,2019-01-01,4,NT,F,97925,S,KIA,CEED,BLUE,DI,1582,2007-10-31
1631628511,929718858,2019-01-01,4,RT,P,91055,BB,TOYOTA,YARIS,RED,PE,998,2002-11-11
1588980025,228077478,2019-01-01,4,RT,P,69520,BN,HYUNDAI,COUPE,SILVER,PE,1975,2006-06-27
1205143651,614637102,2019-01-01,4,RT,P,62554,CA,NISSAN,JUKE,RED,PE,1598,2011-03-09
1759573969,618829162,2019-01-01,4,NT,P,56880,TW,PEUGEOT,308 S AUTO,GREY,PE,1598,2008-11-19
1674276997,682893232,2019-01-01,4,NT,F,80949,S,ALFA ROMEO,MITO,BLACK,PE,1368,2009-05-25
==> dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
949308439,831850247,2019-04-01,4,NT,P,11495,LU,FORD,FIESTA,WHITE,PE,998,2016-06-06
864011467,1337370923,2019-04-01,7,NT,P,144959,S,MERCEDES-BENZ,SPRINTER,SILVER,DI,2143,2015-06-01
906659953,134104785,2019-04-01,4,NT,F,13234,LU,RENAULT,CAPTUR,CREAM,DI,1461,2014-05-09
821362981,697598213,2019-04-01,4,NT,P,47127,S,KIA,RIO,BLUE,DI,1396,2013-03-04
778714495,1033590608,2019-04-01,4,NT,P,46895,LU,VAUXHALL,CORSA,GREY,PE,1229,2012-03-12
736066009,928095205,2019-04-01,4,NT,P,58118,DA,HONDA,CIVIC,BLUE,DI,1597,2016-04-13
693417523,15334170,2019-04-01,4,NT,F,57765,DA,HYUNDAI,AMICA,SILVER,PE,1086,2007-04-30
650769037,1285097458,2019-04-01,4,NT,P,114576,SO,TOYOTA,COROLLA,SILVER,PE,1598,2002-04-12
53690233,211955150,2019-04-01,4,NT,ABR,"",DN,FORD,KA,BLUE,PE,1299,2003-03-31
==> dft_test_result-from-2019-07-01_00-00-01-to-2019-10-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
82833359,970855005,2019-07-01,4,NT,P,10702,LU,KIA,PICANTO,YELLOW,PE,998,2015-09-22
1997536387,711499399,2019-07-01,4,NT,P,63103,LU,VAUXHALL,INSIGNIA,BLACK,DI,1956,2014-10-24
1954887901,711499399,2019-07-01,4,NT,ABR,"",LU,VAUXHALL,INSIGNIA,BLACK,DI,1956,2014-10-24
1912239415,384548675,2019-07-01,4,RT,P,18253,LU,DS,DS3,WHITE,DI,1560,2016-07-09
1869590929,96388074,2019-07-01,4,NT,P,77656,CV,CITROEN,XSARA,SILVER,PE,1587,2003-06-11
1826942443,595424953,2019-07-01,4,NT,P,60312,CV,NISSAN,JUKE,WHITE,DI,1461,2012-11-01
1741645471,1368159163,2019-07-01,7,NT,P,102797,BS,RENAULT,MASTER,BLACK,DI,2299,2014-04-25
1698996985,997585230,2019-07-01,4,NT,P,59156,LU,NISSAN,QASHQAI,BLACK,PE,1598,2011-07-06
1656348499,1431210186,2019-07-01,4,NT,P,195441,SS,FORD,MONDEO,SILVER,DI,1997,2009-06-02
==> dft_test_result-from-2019-10-01_00-00-01-to-2020-01-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
1382999721,1185783101,2019-10-01,4,NT,P,30624,G,AUDI,A6,WHITE,DI,1968,2016-09-30
1212405777,1032453651,2019-10-01,4,NT,P,37795,ME,BMW,520,GREY,DI,1995,2015-04-29
1297702749,1339540147,2019-10-01,7,NT,P,35083,LE,FORD,RANGER,SILVER,DI,3198,2016-09-27
1127108805,300811285,2019-10-01,4,NT,P,36582,NG,BMW,420,WHITE,DI,1995,2017-02-02
1255054263,1493297193,2019-10-01,7,NT,P,15558,SW,FORD,TRANSIT,WHITE,DI,1995,2016-11-30
1169757291,147054239,2019-10-01,4,NT,PRS,29499,OL,VAUXHALL,ASTRA,BLACK,PE,999,2016-09-01
1084460319,454568331,2019-10-01,4,NT,P,37519,ME,BMW,118,GREY,DI,1995,2017-01-20
1041811833,270177294,2019-10-01,4,NT,P,163104,SK,LAND ROVER,FREELANDER,SILVER,DI,1951,2003-11-27
828569403,464178463,2019-10-01,4,NT,F,52844,ME,CITROEN,C4,WHITE,DI,1560,2013-12-03
Around 3.1GB of files again and 39.3 million rows; the heads of the files all look good, and comma is the delimiter again.
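One detail in the heads above is easy to miss: rows without a mileage show `""` in the test_mileage column rather than nothing at all. A rough sketch for counting such rows, assuming no field contains an embedded delimiter (plain `awk` splitting does not understand CSV quoting); `count_empty_field` is an illustrative name:

```shell
# Count data rows whose Nth field is empty, treating a bare "" as empty too.
# count_empty_field is a hypothetical helper; it splits naively on the
# delimiter, so it assumes no quoted field contains that delimiter.
count_empty_field() {
    # $1 = file, $2 = delimiter, $3 = 1-based field number
    awk -F"$2" -v col="$3" '
        NR > 1 && ($col == "" || $col == "\"\"") { n++ }   # NR > 1 skips the header
        END { print n + 0 }
    ' "$1"
}

# Example: test_mileage is the 7th column
# count_empty_field dft_test_result-from-2019-04-01_00-00-01-to-2019-07-01_00-00-01.csv , 7
```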
2018
simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2018.zip
Archive: dft_test_result_2018.zip
inflating: dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv
creating: __MACOSX/
inflating: __MACOSX/._dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv
inflating: dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv
inflating: __MACOSX/._dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv
inflating: dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv
inflating: __MACOSX/._dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv
inflating: dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv
inflating: __MACOSX/._dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv
simon@NUC:~/Documents/mot_data$ rm -rf __MACOSX/
simon@NUC:~/Documents/mot_data$ ls -lh dft_test_result-from-2018-*
-rw-r--r-- 1 simon simon 821M May 11 2020 dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv
-rw-r--r-- 1 simon simon 863M May 11 2020 dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv
-rw-r--r-- 1 simon simon 818M May 11 2020 dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv
-rw-r--r-- 1 simon simon 690M May 11 2020 dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv
simon@NUC:~/Documents/mot_data$ wc -l dft_test_result-from-2018-*
9950816 dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv
10457352 dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv
9916657 dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv
8356980 dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv
38681805 total
simon@NUC:~/Documents/mot_data$ head dft_test_result-from-2018-*
==> dft_test_result-from-2018-01-01_00-00-01-to-2018-04-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
65820879,600511556,2018-01-01,4,NT,P,224076,RH,VAUXHALL,ZAFIRA,GREY,DI,1686,2012-05-31
782304899,388943826,2018-01-01,4,RT,P,111810,WR,VOLKSWAGEN,POLO,SILVER,PE,1198,2005-07-29
824953385,658134358,2018-01-01,4,NT,P,94665,DE,LAND ROVER,UNCLASSIFIED,PINK,DI,3528,1987-02-12
739656413,1034211704,2018-01-01,4,NT,P,66741,LU,HYUNDAI,I20,BLACK,PE,1396,2010-06-30
227874581,1068455764,2018-01-01,4,RT,P,164211,SN,VOLVO,V40,BLUE,DI,1870,2002-09-30
697007927,789941650,2018-01-01,4,NT,P,126213,B,HONDA,CR-V,RED,PE,1998,2002-10-04
185226095,864947656,2018-01-01,4,RT,P,103961,GL,PEUGEOT,206,GREEN,PE,1360,2001-01-16
398468525,768388704,2018-01-01,4,NT,P,62835,LU,HONDA,CIVIC,SILVER,EL,1339,2006-01-03
99929123,1410889022,2018-01-01,4,NT,PRS,128327,SS,AUDI,A4,BLUE,DI,1986,2005-09-28
==> dft_test_result-from-2018-04-01_00-00-01-to-2018-07-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
1527211683,1093474621,2018-04-01,4,NT,P,77010,LU,VAUXHALL,ASTRA,BLACK,DI,1248,2013-11-29
1441914711,1186082240,2018-04-01,4,NT,P,45033,LU,BMW,3 SERIES,WHITE,DI,1995,2012-05-09
1356617739,161669651,2018-04-01,4,NT,P,33131,LU,FORD,FOCUS,WHITE,DI,1560,2014-03-06
1399266225,588925693,2018-04-01,4,NT,P,19970,LU,SEAT,IBIZA,BLACK,PE,1197,2015-06-02
1313969253,576684283,2018-04-01,4,NT,P,18202,LU,VAUXHALL,CORSA,SILVER,DI,1248,2013-06-10
1271320767,258562511,2018-04-01,4,NT,P,67148,LU,VOLKSWAGEN,PASSAT,SILVER,DI,1968,2014-11-14
1228672281,742682739,2018-04-01,4,NT,P,22557,LU,PEUGEOT,108,BLUE,PE,998,2015-03-25
1143375309,582289043,2018-04-01,4,NT,P,40669,LU,MERCEDES-BENZ,C,BLACK,DI,2143,2014-03-27
1186023795,1142811899,2018-04-01,4,NT,F,46206,LU,BMW,520,BLACK,DI,1995,2013-08-21
==> dft_test_result-from-2018-07-01_00-00-01-to-2018-10-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
1704302935,692053805,2018-07-01,4,NT,P,91125,LU,MERCEDES-BENZ,A,BLUE,DI,2143,2014-07-17
1661654449,48596130,2018-07-01,4,NT,P,42598,EN,FORD,FIESTA,BLACK,PE,1388,2004-07-07
847004567,542554216,2018-07-01,4,NT,P,70144,SR,VOLKSWAGEN,GOLF,GREY,DI,1968,2010-11-29
1491060505,24796546,2018-07-01,4,NT,P,89108,EN,VAUXHALL,CORSA,SILVER,PE,1389,2005-09-30
1576357477,430844402,2018-07-01,4,NT,F,66139,LU,MINI,MINI (R60),WHITE,PE,1598,2012-09-01
1448412019,1329137730,2018-07-01,4,NT,P,24040,G,PEUGEOT,PARTNER,RED,DI,1560,2012-09-20
1533708991,352542429,2018-07-01,4,NT,P,90072,LU,BMW,316,GREY,DI,1995,2014-02-04
1363115047,227397037,2018-07-01,4,NT,P,89884,LU,MERCEDES-BENZ,A,GREY,DI,2143,2015-07-31
1320466561,61351339,2018-07-01,4,RT,P,59238,LU,VOLVO,V40,GREY,DI,1560,2014-07-31
==> dft_test_result-from-2018-10-01_00-00-01-to-2019-01-01_00-00-01.csv <==
test_id,vehicle_id,test_date,test_class_id,test_type,test_result,test_mileage,postcode_area,make,model,colour,fuel_type,cylinder_capacity,first_use_date
21790331,852883190,2018-10-01,4,NT,P,73487,EH,CITROEN,RELAY,WHITE,DI,2198,2008-05-30
1979141845,618264835,2018-10-01,4,NT,P,39881,LU,JAGUAR,XF,WHITE,DI,2993,2014-09-30
1936493359,466633390,2018-10-01,4,NT,P,49708,EH,PEUGEOT,BIPPER,RED,DI,1397,2010-03-25
1893844873,381209476,2018-10-01,4,NT,P,107483,CV,MITSUBISHI,ASX,BLACK,DI,1798,2010-11-11
1851196387,70846877,2018-10-01,4,NT,F,42385,LU,CITROEN,C3,BLACK,DI,1560,2016-03-14
1765899415,407497924,2018-10-01,4,NT,PRS,191984,CV,LAND ROVER,DISCOVERY,BLUE,DI,2495,1994-12-31
1595305471,1228557574,2018-10-01,4,NT,PRS,47499,RG,CITROEN,BERLINGO,SILVER,DI,1560,2009-06-08
642757577,611683723,2018-10-01,4,NT,F,19483,AB,FORD,FOCUS,SILVER,PE,998,2014-10-31
912929695,399212205,2018-10-01,4,RT,P,27136,LU,CITROEN,C4,GREY,DI,1560,2016-09-30
3.1GB of files with 38.7 million rows in total, and the heads of the files look valid, with comma as the delimiter again.
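Note that `wc -l` also counts the header line in each file, so the totals slightly overstate the number of tests (by four here: 38681805 lines is 38681801 data rows). A small sketch that counts only data rows across several files; `count_data_rows` is an illustrative name, and it assumes every file starts with exactly one header line and no field contains a newline:

```shell
# Count data rows across one or more delimited files, skipping each header.
# count_data_rows is a hypothetical helper name.
count_data_rows() {
    # FNR restarts at 1 for every input file, so FNR > 1 skips each header.
    awk 'FNR > 1 { n++ } END { print n + 0 }' "$@"
}

# Example:
# count_data_rows dft_test_result-from-2018-*
```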
2017
simon@NUC:~/Documents/mot_data$ unzip dft_test_result_2017.zip
Archive: dft_test_result_2017.zip
inflating: test_result_31870.csv
inflating: test_result_31871.csv
inflating: test_result_31876.csv
inflating: test_result_31879.csv
inflating: test_result_31859.csv
inflating: test_result_31860.csv
inflating: test_result_31861.csv
inflating: test_result_31862.csv
inflating: test_result_31863.csv
inflating: test_result_31864.csv
inflating: test_result_31868.csv
inflating: test_result_31869.csv
Another naming format for 2017. A quick rename to keep it consistent so we know which files relate to which year in case of any issues with importing…
simon@NUC:~/Documents/mot_data$ rename 's/t_3/t_2017_/' *
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2017_18*
-rw-rw-r-- 1 simon simon 266M Jul 4 2018 test_result_2017_1859.csv
-rw-rw-r-- 1 simon simon 265M Jul 4 2018 test_result_2017_1860.csv
-rw-rw-r-- 1 simon simon 266M Jul 4 2018 test_result_2017_1861.csv
-rw-rw-r-- 1 simon simon 266M Jul 4 2018 test_result_2017_1862.csv
-rw-rw-r-- 1 simon simon 266M Jul 4 2018 test_result_2017_1863.csv
-rw-rw-r-- 1 simon simon 265M Jul 4 2018 test_result_2017_1864.csv
-rw-rw-r-- 1 simon simon 265M Jul 4 2018 test_result_2017_1868.csv
-rw-rw-r-- 1 simon simon 265M Jul 4 2018 test_result_2017_1869.csv
-rw-rw-r-- 1 simon simon 265M Jul 4 2018 test_result_2017_1870.csv
-rw-rw-r-- 1 simon simon 266M Jul 4 2018 test_result_2017_1871.csv
-rw-rw-r-- 1 simon simon 265M Jul 4 2018 test_result_2017_1876.csv
-rw-rw-r-- 1 simon simon 265M Jul 4 2018 test_result_2017_1879.csv
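The `rename` expression above replaces the first `t_3` it finds in every file name in the directory, which works here but relies on nothing else matching. A more defensive sketch in plain shell, assuming the extracted parts are named `test_result_3NNNN.csv` as in the unzip output; `rename_2017_parts` is a made-up helper that only touches files matching that pattern and produces the same names as the listing above:

```shell
# Rename test_result_3NNNN.csv to test_result_2017_NNNN.csv.
# rename_2017_parts is a hypothetical helper; the body runs in a subshell
# so the cd does not affect the caller.
rename_2017_parts() (
    # $1 = directory holding the extracted files (defaults to the current one)
    cd "${1:-.}" || exit 1
    for f in test_result_3[0-9][0-9][0-9][0-9].csv; do
        [ -e "$f" ] || continue                    # pattern matched nothing
        new="test_result_2017_${f#test_result_3}"  # drop the prefix, keep NNNN.csv
        if [ -e "$new" ]; then
            echo "skipping $f: $new already exists" >&2
            continue
        fi
        mv -- "$f" "$new"
    done
)

# Example:
# rename_2017_parts ~/Documents/mot_data
```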
simon@NUC:~/Documents/mot_data$ wc -l test_result_2017_18*
3174847 test_result_2017_1859.csv
3171747 test_result_2017_1860.csv
3175419 test_result_2017_1861.csv
3174045 test_result_2017_1862.csv
3172553 test_result_2017_1863.csv
3169572 test_result_2017_1864.csv
3169093 test_result_2017_1868.csv
3169588 test_result_2017_1869.csv
3167012 test_result_2017_1870.csv
3174846 test_result_2017_1871.csv
3168767 test_result_2017_1876.csv
3168684 test_result_2017_1879.csv
38056173 total
simon@NUC:~/Documents/mot_data$ head test_result_2017_18*
==> test_result_2017_1859.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
526927819|465437000|2017-01-02|4|RT|P|112952|GU|MAZDA|6|GREY|PE|2261|2005-01-06
1247473239|885751324|2017-01-02|4|NT|P|57263|ME|FORD|FUSION|BLACK|PE|1596|2008-02-27
351855033|1453814350|2017-01-02|4|NT|P|32882|CR|FORD|FIESTA|BLACK|PE|1388|2011-02-17
95964117|1436475652|2017-01-02|4|NT|P|195000|PR|BMW|320|BLACK|DI|1995|2000-12-01
1328291369|442557202|2017-01-02|4|NT|P|54343|CA|VOLKSWAGEN|BEETLE|BEIGE|PE|1596|2007-10-10
1072400453|502458856|2017-01-02|4|NT|PRS|62633|M|NISSAN|ALMERA|BLUE|PE|1497|2003-01-21
304727705|899608938|2017-01-02|4|NT|P|113202|B|JAGUAR|XJ|GREY|DI|2722|2006-04-25
1409109499|1202362466|2017-01-02|4|NT|F||BS|AUDI|A4|RED|DI|1968|2010-07-29
1281164041|482959208|2017-01-02|4|NT|F|37506|DE|SKODA|FABIA|BLUE|DI|1422|2008-01-29
==> test_result_2017_1860.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
271036903|568137328|2017-01-02|4|NT|P|123645|M|VOLKSWAGEN|POLO|GREY|PE|1390|2004-12-28
1631309613|120764444|2017-01-02|4|RT|P|91889|LA|TOYOTA|YARIS|GREEN|PE|998|2003-10-30
991582323|1455897531|2017-01-02|4|NT|P|44572|CV|FORD|FOCUS|SILVER|PE|998|2013-12-31
735691407|306106474|2017-01-02|4|NT|P|71490|DH|SEAT|LEON|SILVER|PE|1598|2003-05-01
1456236827|934947086|2017-01-02|4|NT|PRS|178845|B|MINI|MINI|GREEN|PE|1598|2001-10-05
560618621|319832178|2017-01-02|4|NT|PRS|96824|B|TOYOTA|AVENSIS|SILVER|DI|1995|2006-12-08
1792945873|198845134|2017-01-02|4|NT|F|100308|RG|FORD|FIESTA|GREEN|PE|1242|1999-08-26
1537054957|509803738|2017-01-02|4|NT|P|174115|N|VAUXHALL|ZAFIRA|BLUE|DI|1910|2007-02-09
769382209|421022658|2017-01-02|4|NT|ABA||OL|NISSAN|ALMERA|BLUE|PE|1497|2000-11-24
==> test_result_2017_1861.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1664502375|1200185902|2017-01-06|4|NT|ABR|142293|BS|FORD|FIESTA|GREEN|PE|1299|2001-02-05
143974661|8412794|2017-01-03|4|NT|P|99818|LS|MERCEDES-BENZ|SPRINTER|BLACK|DI|2148|2008-04-16
1158918247|330083118|2017-01-04|4|NT|P||DT|VOLKSWAGEN|GOLF|SILVER|DI|1896|2002-10-29
119161931|1090613800|2017-01-06|4|NT|P|61056|NP|VAUXHALL|ZAFIRA|SILVER|DI|1686|2010-07-22
1260816059|470262042|2017-01-03|7|RT|P||LA|FORD|TRANSIT|WHITE|DI|1998|2004-04-06
950529483|690396331|2017-01-03|4|NT|P|17239|S|VOLKSWAGEN|UP|RED|PE|999|2013-12-11
1941596189|153542720|2017-01-03|4|NT|P|22657|CF|VAUXHALL|INSIGNIA|SILVER|PE|1796|2012-04-30
636064453|466030066|2017-01-02|4|NT|P|172237|LS|FORD|TRANSIT CONNECT|WHITE|DI|1753|2006-12-29
1980460315|164601067|2017-01-03|4|NT|PRS|30506|CO|NISSAN|JUKE|BLACK|DI|1461|2013-11-26
==> test_result_2017_1862.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
8328237|681645488|2017-01-02|4|NT|F|65814|M|CITROEN|DS3|RED|DI|1560|2011-01-26
1752437321|765728427|2017-01-02|4|NT|P|27679|L|VAUXHALL|CORSA|SILVER|PE|1398|2013-12-31
1624491863|1420528436|2017-01-02|4|NT|P|90797|N|FORD|FIESTA|BLACK|PE|1242|2006-03-31
1240655489|635421232|2017-01-02|4|NT|P|94170|NR|MITSUBISHI|L200|MAROON|DI|2477|2002-01-01
984764573|1187401766|2017-01-02|4|NT|P|30997|PO|TOYOTA|YARIS|BLACK|PE|1296|2007-10-31
217091825|151702496|2017-01-02|4|NT|P|31265|S|FORD|FIESTA|BLACK|PE|1388|2008-12-15
89146367|544640068|2017-01-02|4|NT|F|258702|BB|PEUGEOT|307|BLACK|DI|1997|2004-11-25
1833255451|204271538|2017-01-02|4|NT|P|100228|E|SEAT|AROSA|BLACK|PE|998|2002-04-13
1449419077|1461032462|2017-01-02|4|RT|P|35860|HA|ALFA ROMEO|MITO|GREY|PE|1368|2009-12-31
==> test_result_2017_1863.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
129853893|1373806199|2017-01-02|4|NT|F|11558|DY|PEUGEOT|3008|BLUE|DI|1560|2013-12-31
1873962977|1409972139|2017-01-02|4|NT|F|34223|NN|CITROEN|DISPATCH|BLACK|DI|1560|2012-12-12
1490126603|1110481264|2017-01-02|4|NT|PRS|50541|BS|MAZDA|2|BLACK|PE|1349|2009-12-21
1234235687|1152440150|2017-01-02|4|NT|PRS|23662|DN|FORD|FOCUS|SILVER|PE|1596|2012-05-29
978344771|1148953984|2017-01-02|4|NT|PRS|71688|B|TOYOTA|AYGO|RED|PE|998|2007-12-21
722453855|681423032|2017-01-02|4|RT|P|147865|E|VAUXHALL|ZAFIRA|BLACK|DI|1995|2003-09-26
466562939|133078570|2017-01-02|4|NT|PRS|92445|SE|FORD|FIESTA|BLUE|PE|1242|2004-03-31
210672023|3680216|2017-01-02|7|NT|P|188205|CR|MERCEDES-BENZ|SPRINTER|WHITE|DI|2148|2008-01-02
1954781107|1049276364|2017-01-02|4|NT|ABR||UB|TOYOTA|PRIUS|BLACK|EL|1497|2007-11-12
==> test_result_2017_1864.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1294600567|1453814350|2017-01-02|4|NT|ABR||CR|FORD|FIESTA|BLACK|PE|1388|2011-02-17
1166655109|1139991754|2017-01-02|4|NT|P|53536|DY|BMW|118|GREY|DI|1995|2011-07-08
654873277|1232925478|2017-01-02|4|NT|PRS|35937|AL|FIAT|500|BLACK|PE|1242|2012-04-26
15145987|103343568|2017-01-02|4|NT|PRS|157111|IP|FORD|FUSION|BLUE|DI|1399|2005-05-07
1375418697|112547112|2017-01-02|4|NT|P|141892|RM|HONDA|CR-V|BLACK|DI|2204|2008-01-03
223909575|257750630|2017-01-02|4|NT|F|50937|MK|RENAULT|CLIO|SILVER|PE|1149|2008-01-10
1712127743|669189918|2017-01-02|4|NT|F|48530|WF|VOLKSWAGEN|GOLF|RED|DI|1968|2010-11-26
944454995|659540514|2017-01-02|4|NT|P|90850|B|TOYOTA|AURIS|WHITE|PE|1797|2012-01-03
1920891331|1394897952|2017-01-02|4|NT|P|61129|RG|JAGUAR|X TYPE|BLUE|DI|1988|2004-01-09
==> test_result_2017_1868.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
15543935|228810164|2017-01-02|4|NT|PRS|66744|HA|HONDA|STREAM|BLUE|PE|1998|2004-06-22
1759653019|934785018|2017-01-02|4|NT|P|82670|IG|FORD|FIESTA|RED|PE|1388|2009-11-27
1503762103|290561228|2017-01-02|4|NT|P|40320|TW|FORD|KA|BLUE|PE|1299|2006-11-01
480198439|1258278844|2017-01-02|2|NT|P|24372|BB|YAMAHA|TTR 600 RE|BLUE|PE|595|2005-03-02
1072798401|1259526146|2017-01-02|4|NT|F|174986|BD|HONDA|CIVIC|SILVER|PE|1590|2001-09-27
1537452905|446317648|2017-01-02|4|RT|P|38642|B|MERCEDES|CLC 180|BLUE|PE|1796|2008-11-26
385943783|724431334|2017-01-02|4|RT|P|78845|RH|RENAULT|MEGANE|RED|DI|1870|2008-01-24
1234434661|1163462280|2017-01-02|4|RT|P|110076|BB|VAUXHALL|CORSA|SILVER|PE|1199|2004-06-29
1443198249|550412542|2017-01-02|4|RT|P|86386|FK|VAUXHALL|COMBO|WHITE|DI|1248|2008-06-30
==> test_result_2017_1869.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1436181525|1379300556|2017-01-02|4|NT|P|113743|M|MERCEDES-BENZ|C|SILVER|DI|2148|2008-12-22
796454235|488253688|2017-01-02|4|NT|P|95121|MK|SEAT|LEON|RED|PE|1984|2007-10-29
412617861|799495426|2017-01-02|4|NT|P|61172|NR|VAUXHALL|CORSA|BLUE|DI|1248|2009-01-19
156726945|1449047278|2017-01-02|4|NT|P|343791|DE|AUDI|A6|GREY|DI|1871|2002-09-27
1516999655|165452987|2017-01-02|4|NT|P|39811|SA|TOYOTA|AYGO|ORANGE|PE|998|2012-12-27
1133163281|558090558|2017-01-02|4|RT|P|132273|WA|VAUXHALL|VECTRA|SILVER|PE|2198|2005-12-22
237545075|974869918|2017-01-02|4|NT|P|91510|NN|BMW|118|GREY|DI|1995|2009-09-01
1294799541|709044580|2017-01-02|4|NT|PRS|67082|DG|BMW|318I SE|BLUE|PE|1995|2008-12-12
96163091|205539776|2017-01-02|4|NT|P|131730|CO|VAUXHALL|ZAFIRA|RED|DI|1995|2000-10-29
==> test_result_2017_1870.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
561016569|1001447262|2017-01-02|4|NT|F|66314|GL|VAUXHALL|CORSA|SILVER|PE|1229|2009-04-08
305125653|1303343730|2017-01-02|4|NT|P|16665|CR|NISSAN|CUBE|WHITE|PE|1598|2010-10-01
1921289279|1064420574|2017-01-02|4|RT|P|82895|YO|VAUXHALL|ZAFIRA|BLACK|PE|1598|2007-11-06
1281561989|516766956|2017-01-02|4|NT|P|88297|LE|AUDI|A5|WHITE|DI|1968|2009-12-07
130052867|132755868|2017-01-02|4|RT|P|83877|BS|HYUNDAI|I30|GREY|DI|1582|2009-05-20
2107409|160305148|2017-01-02|4|NT|P|64047|B|FORD|S-MAX|WHITE|DI|1997|2012-03-02
210870997|718533400|2017-01-02|4|NT|P|75327|UB|TOYOTA|MPV|WHITE|PE|2400|2003-12-31
547580043|526666766|2017-01-02|4|NT|P|152928|BB|VOLKSWAGEN|GOLF|SILVER|DI|1896|2002-05-16
1012234547|664721978|2017-01-02|4|NT|P|116496|CT|LEXUS|RX300|SILVER|PE|2995|2003-05-12
==> test_result_2017_1871.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
736089355|1129907754|2017-01-02|4|NT|P|43543|SL|TOYOTA|AURIS|GREY|PE|1598|2008-03-04
224307523|1199330268|2017-01-02|4|NT|PRS|152317|E|MERCEDES|C 200|SILVER|DI|2148|2009-09-01
1840471149|672782440|2017-01-02|4|RT|P|83395|CW|RENAULT|CLIO|BLUE|PE|1149|2003-11-28
1200743859|20307830|2017-01-02|4|NT|P|66715|BB|VAUXHALL|ASTRA|GOLD|PE|1364|2004-09-20
433071111|622653478|2017-01-02|4|RT|P|61814|TF|HYUNDAI|COUPE|SILVER|PE|2656|2004-06-24
769780157|152914472|2017-01-02|4|NT|F|105888|LE|VAUXHALL|ZAFIRA|SILVER|PE|1598|2002-04-18
1746216493|1470131416|2017-01-02|4|NT|PRS|136357|GL|VOLKSWAGEN|PASSAT|SILVER|DI|1896|2003-10-02
466761913|327911534|2017-01-02|4|NT|P|192214|B|BMW|730|SILVER|DI|2993|2006-05-26
803470959|147573090|2017-01-02|4|NT|P|68137|BD|MERCEDES-BENZ|CLK|SILVER|PE|3199|2005-03-17
==> test_result_2017_1876.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
35798211|1147988404|2017-01-02|4|RT|P|76611|BD|TOYOTA|STARLET|WHITE|PE|1332|1998-08-02
372507257|972820804|2017-01-02|4|RT|P|80419|PE|VAUXHALL|ASTRA|BLACK|PE|1598|2010-01-28
453325387|1305992244|2017-01-02|4|NT|P|61567|G|AUDI|A4|WHITE|DI|1968|2012-01-27
917979891|1194827588|2017-01-02|4|NT|P|65888|HA|MAZDA|5|SILVER|PE|1999|2009-12-17
1126743479|131080144|2017-01-02|4|NT|P|105119|CM|BMW|318|SILVER|PE|1995|2006-12-22
1079616151|590584028|2017-01-02|4|NT|P|65488|BN|FORD|FIESTA|BLACK|PE|1388|2011-06-09
946546149|1267220428|2017-01-04|4|NT|P|76300|NP|FORD|TRANSIT|SILVER|DI|1998|2010-01-13
776597907|1075609506|2017-01-02|4|NT|F|158994|TW|HONDA|CR-V|SILVER|PE|1998|2005-06-02
1113306953|121477085|2017-01-02|4|RT|P|17545|SO|FORD|FIESTA|GREY|PE|998|2013-12-30
==> test_result_2017_1879.csv <==
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1699089165|1286486204|2017-01-02|4|RT|P|98840|CB|FORD|MONDEO|BLACK|DI|2198|2006-06-23
1571143707|13617278|2017-01-02|4|NT|P|118231|M|RENAULT|MEGANE|BLUE|DI|1461|2004-09-08
163743669|9904378|2017-01-02|4|RT|P|38066|NW|RENAULT|CLIO|BLUE|PE|1390|2007-07-30
500452715|1397487294|2017-01-02|4|RT|P|26616|SO|FORD|FIESTA|WHITE|PE|1388|2009-12-16
709216303|918715788|2017-01-02|4|NT|P|96050|WS|VOLKSWAGEN|GOLF|GREY|PE|1390|2007-05-04
1045925349|1337436588|2017-01-02|4|NT|P|98042|PO|FORD|GRAND C-MAX|GREY|DI|1560|2011-01-31
1254688937|1410604904|2017-01-02|4|NT|PRS|67347|PR|CITROEN|DS3|GREY|DI|1560|2011-09-28
1335507067|875005856|2017-01-02|4|NT|P|70755|B|HONDA|CIVIC|GREY|PE|1339|2006-06-13
951670693|962559766|2017-01-02|4|NT|P|62915|HA|BMW|535|BLACK|DI|2993|2011-03-18
3.1GB and 38 million rows, and the heads of the files look good. We’re back to | as the delimiter now.
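Since the delimiter changes between years (| here, comma for the 2018 to 2020 files), it can help to record which one each file uses before importing. A minimal sketch that guesses the delimiter from the header line by counting candidate characters; `sniff_delimiter` is a hypothetical name and only `|` and `,` are considered:

```shell
# Print "|" or "," depending on which occurs more often in the header line.
# sniff_delimiter is a hypothetical helper; the body runs in a subshell so
# its variables stay local.
sniff_delimiter() (
    # $1 = file
    header=$(head -n 1 "$1")
    # $(( ... )) strips any padding some wc implementations add.
    pipes=$(( $(printf '%s' "$header" | tr -cd '|' | wc -c) ))
    commas=$(( $(printf '%s' "$header" | tr -cd ',' | wc -c) ))
    if [ "$pipes" -gt "$commas" ]; then
        printf '|\n'
    else
        printf ',\n'
    fi
)

# Example: list the delimiter next to each extracted file
# for f in test_result_2017_18*; do printf '%s\t%s\n' "$f" "$(sniff_delimiter "$f")"; done
```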
2016
For 2016 and earlier, gzip was used for file compression…
simon@NUC:~/Documents/mot_data$ gunzip test_result_2016.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2016.txt
-rw-rw-r-- 1 simon simon 3.1G Jan 14 15:46 test_result_2016.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2016.txt
37693381 test_result_2016.txt
simon@NUC:~/Documents/mot_data$ head test_result_2016.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1645480751|1374211238|2016-01-01|4|NT|P|117033|SM|VOLKSWAGEN|POLO|BLACK|PE|1600|2000-06-23
1393462389|1153769898|2016-01-01|4|NT|P|99292|NE|VOLKSWAGEN|PASSAT|BLUE|DI|1968|2006-11-30
1863202023|1485039300|2016-01-01|7|NT|PRS|170320|E|MERCEDES|SPRINTER 313 CDI LWB|WHITE|DI|2148|2005-01-14
1304292863|1097073904|2016-01-01|4|RT|P|70623|NN|MINI|MINI|GREY|PE|1598|2004-04-08
845810407|1166548800|2016-01-01|4|NT|P|21567|DL|NISSAN|JUKE|BLUE|PE|1612|2011-07-07
1474886807|1397571962|2016-01-01|4|NT|P|62207|CR|SUZUKI|WAGON-R+|SILVER|PE|1328|2005-02-09
1560183779|798718734|2016-01-01|4|NT|P|44855|PE|FIAT|PUNTO ACTIVE 8V|BLUE|PE|1242|2005-12-19
1314962575|631812270|2016-01-01|4|NT|P|163692|NE|RENAULT|TRAFIC|BLUE|DI|1870|2003-03-28
1517535293|244951922|2016-01-01|4|NT|P|41057|BS|DAEWOO|MATIZ|SILVER|PE|796|2003-09-01
Single file of 3.1GB with 37.7 million rows and the head of the file looks good.
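By default `gunzip` replaces the archive with the 3.1GB text file. If disk space is tight, the same checks can be run without writing the decompressed file. A sketch assuming a standard `gzip`; `peek_gz` is an illustrative helper name:

```shell
# Verify a .gz file and show its first lines without decompressing to disk.
# peek_gz is a hypothetical helper name.
peek_gz() {
    # $1 = .gz file, $2 = number of lines to show (default 10)
    gzip -t "$1" || return 1             # integrity check of the archive
    gzip -dc "$1" | head -n "${2:-10}"   # stream to stdout, nothing written
}

# Line count without keeping the decompressed file:
# gzip -dc test_result_2016.txt.gz | wc -l
# Decompress but keep the .gz alongside (needs gzip 1.6 or later):
# gunzip -k test_result_2016.txt.gz
```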
2015
simon@NUC:~/Documents/mot_data$ gunzip test_result_2015.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2015.txt
-rw-rw-r-- 1 simon simon 3.1G Jan 14 15:46 test_result_2015.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2015.txt
37490737 test_result_2015.txt
simon@NUC:~/Documents/mot_data$ head test_result_2015.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1380469872|1023201026|2015-01-01|1|NT|ABR||OX|HONDA|XL125V|SILVER|PE|125|2003-06-06
1844368232|170614898|2015-01-01|1|NT|ABR||WD|AJS|DD 125 E 08|BLACK|PE|125|2009-12-21
938902988|980532916|2015-01-01|4|NT|ABR||KT|SUBARU|IMPREZA|SILVER|PE|2457|2006-05-22
1989308134|1258278786|2015-01-01|4|RT|P|107604|SN|LAND ROVER|DISCOVERY|GREEN|DI|2720|2005-01-18
1100975468|343484998|2015-01-01|4|RT|P|79858|CV|RENAULT|SCENIC|BLUE|DI|1461|2004-09-17
511507734|636234930|2015-01-01|4|NT|P|91006|CM|ALFA ROMEO|GT|SILVER|PE|1970|2004-12-09
1625020172|1051800982|2015-01-01|4|NT|P|109353|DH|ISUZU|RODEO|BLACK|DI|2999|2003-12-02
18614018|1429705738|2015-01-01|4|NT|P|109074|BB|VAUXHALL|ZAFIRA|BLACK|DI|1995|2004-07-19
1872117038|1115925744|2015-01-01|4|RT|P|154396|CB|BMW|X5|SILVER|DI|2926|2002-09-19
Very similar stats to 2016: 3.1GB, 37.5 million rows, and the file contents look good.
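The column names have been identical in every head so far, with only the delimiter differing. A sketch that checks this mechanically by normalising the delimiter in each header and comparing it against the first file; `same_header` is a made-up name:

```shell
# Print every file whose header differs from the first file's header once
# | is normalised to a comma. Exit status is non-zero if any file differs.
# same_header is a hypothetical helper; the body runs in a subshell.
same_header() (
    first=$(head -n 1 "$1" | tr '|' ',')
    status=0
    for f in "$@"; do
        if [ "$(head -n 1 "$f" | tr '|' ',')" != "$first" ]; then
            echo "header differs: $f"
            status=1
        fi
    done
    exit "$status"
)

# Example:
# same_header test_result_2015.txt test_result_2016.txt test_result_2017_18*.csv
```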
2014
simon@NUC:~/Documents/mot_data$ gunzip test_result_2014.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2014.txt
-rw-rw-r-- 1 simon simon 3.1G Jan 14 15:46 test_result_2014.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2014.txt
37493826 test_result_2014.txt
simon@NUC:~/Documents/mot_data$ head test_result_2014.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
75849048|98956394|2014-01-01|4|RT|P|71002|BD|FORD|FOCUS|SILVER|PE|1596|2003-12-01
1893007630|479820560|2014-01-01|4|RT|P|64836|HU|VAUXHALL|CORSA|GREEN|PE|973|1998-12-31
1786152630|659469072|2014-01-01|4|NT|P|143739|SL|LAND ROVER|DISCOVERY|BLUE|DI|2495|1996-08-06
888025356|951392102|2014-01-01|4|NT|F|62949|LE|DAEWOO|MATIZ|GREEN|PE|796|2002-07-22
159215682|1230516026|2014-01-01|4|NT|P|94954|UB|BMW|318|GREY|PE|1995|2006-07-01
640798820|759806792|2014-01-01|4|NT|P|82823|SL|VOLKSWAGEN|POLO|GREEN|PE|1390|1999-05-05
866472130|416537576|2014-01-01|4|RT|P|100789|BB|FORD|FOCUS|SILVER|PE|1796|2001-05-01
548721000|480095260|2014-01-01|4|NT|F|94968|GL|ISUZU|TF|GREEN|DI|2499|2003-07-16
1985242428|1364748290|2014-01-01|4|NT|P|43526|WS|VAUXHALL|ASTRA|BLACK|PE|1796|2010-01-29
3.1GB with 37.5 million rows and file contents looking good.
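`head` only shows the first ten lines. A cheap whole-file check is to confirm that every line splits into the same number of fields as the header (14 here). A sketch for the pipe-delimited files, where no quoting appears in the samples; `report_bad_rows` is an illustrative name, and a naive split like this would miscount any quoted field that contains the delimiter:

```shell
# Print "line_number field_count" for every line whose field count differs
# from the header's; exit status is non-zero if any such line is found.
# report_bad_rows is a hypothetical helper name.
report_bad_rows() {
    # $1 = file, $2 = delimiter
    awk -F"$2" '
        NR == 1    { want = NF; next }     # remember the header width
        NF != want { print NR, NF; bad++ }
        END        { exit (bad > 0) }
    ' "$1"
}

# Example:
# report_bad_rows test_result_2014.txt '|' && echo "all rows have the same width"
```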
2013
simon@NUC:~/Documents/mot_data$ gunzip test_result_2013.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2013.txt
-rw-rw-r-- 1 simon simon 3.1G Jan 14 15:46 test_result_2013.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2013.txt
37361926 test_result_2013.txt
simon@NUC:~/Documents/mot_data$ head test_result_2013.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
608789348|738799388|2013-01-01|4|NT|F|104217|WF|HYUNDAI|GETZ|SILVER|DI|1493|2006-08-17
1243374888|1284015018|2013-01-01|4|NT|PRS|48807|WF|FORD|FOCUS|BLUE|PE|1596|2008-11-26
1382967316|1025199626|2013-01-01|4|RT|P|85159|S|FORD|TRANSIT CONNECT|WHITE|DI|1753|2006-01-26
566374210|159460880|2013-01-01|4|NT|P|138483|WF|VAUXHALL|MOVANO|WHITE|DI|2463|2007-12-17
496179986|366607340|2013-01-01|4|NT|F|76427|B|SEAT|AROSA|SILVER|PE|998|2002-11-07
502846824|1359633208|2013-01-01|4|NT|F|100982|B|VAUXHALL|ASTRA|BLACK|PE|1598|2005-03-01
1199477564|1184417722|2013-01-01|4|NT|F|76525|NP|FORD|FOCUS|BLUE|DI|1753|2003-10-10
1794049122|1125863740|2013-01-01|4|NT|P|127122|BB|CHRYSLER|VOYAGER|PURPLE|DI|2776|1999-09-13
1489198368|45724212|2013-01-01|4|NT|P|45651|HX|TOYOTA|COROLLA VERSO|RED|PE|1794|2006-01-16
3.1GB with 37.4 million rows and file contents looking good.
2012
simon@NUC:~/Documents/mot_data$ gunzip test_result_2012.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2012.txt
-rw-rw-r-- 1 simon simon 3.0G Jan 14 15:46 test_result_2012.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2012.txt
36846343 test_result_2012.txt
simon@NUC:~/Documents/mot_data$ head test_result_2012.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
453094722|1336915420|2012-01-01|4|NT|F|89540|S|DAEWOO|MATIZ|GREEN|PE|796|2002-03-22
1987609170|496266720|2012-01-01|4|RT|P|16380|S|PEUGEOT|306|YELLOW|DI|1905|1999-08-09
903083256|118415214|2012-01-01|4|NT|P|96341|WF|VAUXHALL|MOVANO|WHITE|DI|2464|2008-01-03
1112913888|763268186|2012-01-01|4|NT|P|48571|WF|VAUXHALL|ASTRA|WHITE|DI|1248|2008-12-14
1559139028|1432194588|2012-01-01|4|RT|P|111319|PE|AUDI|A2|BLUE|PE|1390|2001-11-16
782610276|955517208|2012-01-01|4|NT|PRS|75455|HU|NISSAN|MICRA|WHITE|PE|998|1995-01-31
1754413996|455754094|2012-01-01|4|NT|F|56120|WF|VAUXHALL|ASTRA|WHITE|DI|1248|2008-12-14
1067194566|1071396330|2012-01-01|4|RT|P|151189|SN|SUZUKI|VITARA|BLACK|DI|1590|1997-08-01
1941098522|551536520|2012-01-01|4|NT|P|79042|CW|PEUGEOT|307|GREY|DI|1560|2006-09-26
3.0GB with 36.8 million rows and file contents looking good.
2011
simon@NUC:~/Documents/mot_data$ gunzip test_result_2011.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2011.txt
-rw-rw-r-- 1 simon simon 3.0G Jan 14 15:46 test_result_2011.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2011.txt
36849155 test_result_2011.txt
simon@NUC:~/Documents/mot_data$ head test_result_2011.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
539690916|244877620|2011-01-01|4|NT|P|54127|NR|VOLKSWAGEN|POLO|SILVER|PE|1390|2003-01-03
201563620|1094583990|2011-01-01|4|NT|PRS|63997|NR|FORD|FOCUS|BLACK|PE|1988|2002-07-22
1208708976|135050588|2011-01-01|4|NT|F|83946|SA|FORD|FIESTA|BLUE|PE|1299|1998-12-31
1898167378|507467582|2011-01-01|4|NT|P|45917|NP|MG|B GT|WHITE|PE|1798|1971-07-01
367395770|1280070280|2011-01-01|4|NT|P|229153|LE|BMW|525|GREEN|DI|2498|1996-01-30
732978810|936792652|2011-01-01|4|RT|P|122252|SS|AUDI|A3|BLACK|DI|1968|2004-10-25
379165572|302739506|2011-01-01|4|NT|P|156126|WR|VOLVO|V40|SILVER|DI|1870|2002-03-28
1532977216|1490017506|2011-01-01|4|RT|P|58621|RG|LAND ROVER|DEFENDER 110|GREEN|DI|2495|1995-08-01
949214096|63021908|2011-01-01|4|NT|P|84862|N|MERCEDES|C 180K|BLUE|PE|1796|2003-12-22
3.0GB with 36.8 million rows and data looking good.
2010 #
simon@NUC:~/Documents/mot_data$ gunzip test_result_2010.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2010.txt
-rw-rw-r-- 1 simon simon 3.0G Jan 14 15:46 test_result_2010.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2010.txt
36134921 test_result_2010.txt
simon@NUC:~/Documents/mot_data$ head test_result_2010.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
806196694|188581750|2010-01-01|7|NT|ABR||CH|FORD|TRANSIT|WHITE|DI|2402|2003-01-15
1471475604|753317992|2010-01-01|4|NT|F|84722|BD|VAUXHALL|ZAFIRA|BLACK|PE|1796|2001-10-18
1392494248|272470820|2010-01-01|4|NT|P|10114|DE|FIAT|DUCATO|WHITE|DI|2800|2000-11-17
1735013826|1366770062|2010-01-01|4|NT|PRS|78368|BD|TOYOTA|COROLLA|WHITE|PE|1332|1995-08-31
1629073590|49902474|2010-01-01|4|NT|PRS|51000|BL|PEUGEOT|307|SILVER|PE|1360|2003-07-17
767336288|673773234|2010-01-01|4|NT|P|153356|BD|PEUGEOT|307|BLUE|DI|1997|2003-12-06
1873135550|257169262|2010-01-01|4|RT|P|79294|BD|FORD|FOCUS|BLUE|PE|1596|1999-11-24
1559183164|976959056|2010-01-01|4|NT|P|137263|WV|TOYOTA|LUCIDA|BLUE|DI|2184|1997-01-01
1890031062|967342532|2010-01-01|4|NT|F|133476|W|TOYOTA|ESTIMA 2WD AUTO|SILVER|DI|2180|1996-12-31
3.0GB with 36.1 million rows and file contents look good.
2009 #
simon@NUC:~/Documents/mot_data$ gunzip test_result_2009.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2009.txt
-rw-rw-r-- 1 simon simon 2.9G Jan 14 15:45 test_result_2009.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2009.txt
35436944 test_result_2009.txt
simon@NUC:~/Documents/mot_data$ head test_result_2009.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1379380200|1336022860|2009-01-01|4|NT|ABR||DE|MERCEDES|C 250|BLUE|DI|2497|1997-02-26
717802790|304757246|2009-01-01|4|NT|P|49013|BD|ROVER|25 IMPRESSION S|BLUE|PE|1396|2001-03-31
1792798112|466136480|2009-01-01|4|NT|P|196306|DE|RENAULT|UNCLASSIFIED|WHITE|DI|2463|2004-04-13
1082280248|1267482366|2009-01-01|4|NT|F|111686|BB|ROVER|416|RED|PE|1589|1999-12-31
1366813380|64410314|2009-01-01|4|RT|P|100460|IP|RENAULT|MEGANE SCENIC|RED|PE|1598|1998-05-27
1275514134|1452931758|2009-01-01|4|NT|P|88176|CH|RENAULT|MEGANE|BLUE|DI|1461|2005-01-05
1461097168|343630626|2009-01-01|4|RT|P|76606|HU|AUDI|TT|SILVER|PE|1781|2002-05-13
295185618|149270444|2009-01-01|4|NT|P|56610|W|VAUXHALL|ASTRA|GREY|PE|1389|2002-09-04
1761887058|704695744|2009-01-01|4|RT|P|68062|GU|AUDI|A2|GREY|DI|1422|2002-01-14
2.9GB with 35.4 million rows and file contents look good.
2008 #
simon@NUC:~/Documents/mot_data$ gunzip test_result_2008.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2008.txt
-rw-rw-r-- 1 simon simon 2.8G Jan 14 15:45 test_result_2008.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2008.txt
34439133 test_result_2008.txt
simon@NUC:~/Documents/mot_data$ head test_result_2008.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
317588902|124592814|2008-01-01|4|NT|P|189567|NE|RENAULT|ESPACE|BLUE|PE|1995|1993-01-01
1385282064|1165558462|2008-01-01|4|NT|F|116718|BB|TOYOTA|CARINA E|BLUE|PE|1587|1994-04-26
1425711724|587883270|2008-01-01|4|RT|P|101904|BB|NISSAN|ALMERA|GREEN|PE|1392|1997-12-10
594025786|687800162|2008-01-01|4|NT|ABA||B|HONDA|CIVIC|BLUE|PE|1590|2005-02-24
1741336990|641514360|2008-01-01|2|NT|P|31926|TA|MATCHLESS|G3LS|BLACK|PE|347|1958-07-08
1377218714|1203478532|2008-01-01|4|NT|P|39225|BD|PEUGEOT|306|SILVER|PE|1761|2001-03-01
681413512|1296120616|2008-01-01|4|RT|P|135651|HX|MITSUBISHI|SHOGUN|SILVER|DI|2835|1992-10-09
1771273408|399202526|2008-01-01|4|NT|PRS|112790|BS|RENAULT|CLIO|RED|PE|1390|1996-07-23
681215928|1414798484|2008-01-01|4|NT|F|75014|BB|HONDA|CIVIC|BLUE|PE|1396|1998-02-02
2.8GB with 34.4 million rows and file contents look good.
2007 #
simon@NUC:~/Documents/mot_data$ gunzip test_result_2007.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2007.txt
-rw-rw-r-- 1 simon simon 2.8G Jan 14 15:45 test_result_2007.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2007.txt
33591239 test_result_2007.txt
simon@NUC:~/Documents/mot_data$ head test_result_2007.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
808298134|151699072|2007-01-01|4|NT|ABR||SK|FORD|MAVERICK|GREEN|PE|1988|2002-05-02
842444180|1291028996|2007-01-01|4|RT|P|97109|HU|VAUXHALL|ASTRA|WHITE|DI|1700|1999-03-31
348649550|174602976|2007-01-01|4|NT|PRS|28389|M|VAUXHALL|CAVALIER L|GOLD|PE|1598|1992-01-24
1828509444|369109734|2007-01-01|4|NT|P|82088|E|RENAULT|LAGUNA|SILVER|PE|1998|1996-03-29
1697977864|800848976|2007-01-01|4|NT|F|96285|EX|VAUXHALL|VECTRA|SILVER|PE|1799|1999-03-29
966004542|378981876|2007-01-01|4|RT|P|110393|BD|FORD|GALAXY ZETEC TDI|SILVER|DI|1896|2001-12-10
1307568758|1479243936|2007-01-01|4|NT|P|113044|WF|VAUXHALL|ASTRA|BLUE|PE|1389|1993-04-06
1988447032|1223225444|2007-01-01|4|NT|P|101674|BB|HONDA|ACCORD|GREEN|PE|1850|1998-11-24
1123460628|429612316|2007-01-01|4|NT|PRS|78053|BB|HYUNDAI|ACCENT|WHITE|PE|1341|2000-01-15
2.8GB and 33.6 million rows with file content looking good.
2006 #
simon@NUC:~/Documents/mot_data$ gunzip test_result_2006.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2006.txt
-rw-rw-r-- 1 simon simon 2.6G Jan 14 15:44 test_result_2006.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2006.txt
32014081 test_result_2006.txt
simon@NUC:~/Documents/mot_data$ head test_result_2006.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
1949156228|816235882|2006-01-01|4|NT|P|54101|SA|AUSTIN|MINI MAYFAIR|GREEN|PE|998|1985-05-29
1231051602|1038868250|2006-01-01|4|NT|P|134032|CW|MITSUBISHI|SHOGUN|WHITE|DI|2477|1989-12-31
1976279366|416238784|2006-01-01|0|NT|P|100087|BD|VAUXHALL|UNCLASSIFIED|BLUE|PE||
737220590|967257220|2006-01-01|4|NT|P|95802|SE|FORD|ESCORT|RED|PE|1597|1999-04-30
1564944812|360370778|2006-01-01|4|NT|P|18325|PO|KIA|CARENS|BLUE|PE|1793|2000-09-27
1102734306|270469542|2006-01-01|4|NT|P|27325|NE|RENAULT|CLIO|GREEN|PE|1149|2001-02-20
655291568|957107084|2006-01-01|4|RT|P||HR|CITROEN|SAXO FORTE|ORANGE|PE|1124|2000-04-27
1252564280|1020366628|2006-01-01|4|RT|P|15621|E|NISSAN|MICRA GX AUTO|WHITE|PE|1275|1997-05-29
1232280538|46559388|2006-01-01|4|NT|P|51937|E|NISSAN|UNCLASSIFIED|GREEN|PE|1392|1997-12-11
2.6GB with 32 million rows and the head of the file looks good.
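`head` only shows the first ten lines, so a wider check can be useful. The following is a minimal sketch, not something run for this post: it counts the lines whose number of pipe-delimited fields differs from the header line. `count_bad_rows` is a made-up helper name, and the check assumes no field contains a literal `|` character.

```shell
# Sketch: count lines whose field count differs from the header's.
# count_bad_rows is a hypothetical helper, not part of any tool.
# Assumes fields are never quoted and never contain a literal '|'.
count_bad_rows() {
    awk -F'|' '
        NR == 1    { want = NF; next }   # header defines the expected width
        NF != want { bad++ }             # any other width is suspicious
        END        { print bad + 0 }     # print 0 when nothing was flagged
    ' "$1"
}

f=test_result_2006.txt
if [ -f "$f" ]; then
    count_bad_rows "$f"
fi
```

Empty trailing fields, such as the missing cylinder_capacity and first_use_date in some 2005 and 2006 rows, still count as fields, so those rows are not flagged.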
2005 #
simon@NUC:~/Documents/mot_data$ gunzip test_result_2005.txt.gz
simon@NUC:~/Documents/mot_data$ ls -lh test_result_2005.txt
-rw-rw-r-- 1 simon simon 621M Jan 14 15:29 test_result_2005.txt
simon@NUC:~/Documents/mot_data$ wc -l test_result_2005.txt
7499745 test_result_2005.txt
simon@NUC:~/Documents/mot_data$ head test_result_2005.txt
test_id|vehicle_id|test_date|test_class_id|test_type|test_result|test_mileage|postcode_area|make|model|colour|fuel_type|cylinder_capacity|first_use_date
804664368|256274986|2005-01-01|0|NT|P|23459|TF|FORD|UNCLASSIFIED|SILVER|PE||
392603376|633988704|2005-01-01|0|NT|P|40961|E|LOTUS|UNCLASSIFIED|RED|PE||
1894843206|1320781748|2005-01-01|0|NT|P|16416|S|VAUXHALL|UNCLASSIFIED|BLUE|PE||
830908928|1263031090|2005-01-01|4|NT|P|93318|W|LAND ROVER|109 V8 S.W.|BLUE|PE|3528|1981-04-06
727535460|1123257842|2005-01-01|4|NT|P|121930|RG|CITROEN|AX|WHITE|DI|1360|1993-08-31
207507680|1168225356|2005-01-01|0|NT|P|122296|FK|CHRYSLER|UNCLASSIFIED|BLACK|DI||
932135720|215535474|2005-01-01|4|NT|P|74823|DG|VAUXHALL|VECTRA|BLUE|PE|1598|1996-10-21
1932156144|1100578334|2005-01-01|4|NT|P|63133|SY|VAUXHALL|CORSA|GREY|PE|1389|1997-03-28
1416289564|1239943850|2005-01-01|4|NT|P|73256|SN|FORD|FIESTA|SILVER|PE|1119|1994-05-25
621MB with 7.5 million rows.
The low row count for 2005 is explained in the accompanying guide to the data, which states:
Computerisation was not fully implemented across Great Britain until 01/04/2006, therefore the dataset will not contain all tests performed between 01/01/2005 and 31/03/2006. The data encompasses all tests for which a valid MOT pass could have been a potential outcome.
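The per-year checks above all follow the same pattern, so they could be scripted rather than typed out. This is a rough sketch under a few assumptions: the files are already decompressed, they sit in the current directory, and they follow the `test_result_YYYY.txt` naming seen above. `summarise_file` is a made-up helper name. Note that `wc -l` counts the header line too, so the number of data rows is one less than the figure it reports.

```shell
# Sketch: repeat the manual size / line count / preview check per file.
# summarise_file is a hypothetical helper, not part of any tool.
summarise_file() {
    file="$1"
    lines=$(wc -l < "$file" | tr -d ' ')   # total lines, header included
    printf '%s: %s lines, %s data rows\n' "$file" "$lines" "$((lines - 1))"
}

for f in test_result_20??.txt; do
    [ -f "$f" ] || continue                # the glob matched nothing
    ls -lh "$f"                            # size on disk
    summarise_file "$f"                    # line and data-row counts
    head -n 3 "$f"                         # header plus two sample rows
done
```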
Next Step #
The whole data set looks good. The next step is to load it into a DuckDB database, which is covered in the next post in this project.